4.3.1.1 Most Frequent Sentence Beginnings I
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word N-grams at the beginning of sentences give some insight into sentence composition.
For N = 1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
Rank | Count | Beginning |
---|---|---|
52707 | 7707 | Le |
45160 | 5349 | La |
60292 | 5105 | Les |
35326 | 4389 | Il |
1 | 2744 | « |
27817 | 2608 | En |
21 | 1983 | A |
9347 | 1829 | Ce |
95278 | 1802 | Une |
70134 | 1784 | Mais |
82401 | 1783 | Pour |
94702 | 1748 | Un |
31309 | 1598 | Et |
19748 | 1390 | Dans |
77316 | 1388 | On |
13128 | 1348 | C’est |
42630 | 1302 | Je |
26388 | 1059 | Elle |
23204 | 1051 | Dès |
16179 | 1020 | Cette |
4863 | 1005 | Au |
75555 | 948 | Nous |
39157 | 947 | Ils |
21575 | 846 | De |
13132 | 807 | C'est |
88679 | 803 | Selon |
89974 | 783 | Si |
3500 | 710 | Après |
12204 | 621 | Ces |
7074 | 511 | Avec |
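For readers without a MySQL instance, the following is a minimal Python sketch of the same counting. It assumes the sentences are already available as a list of strings. The function name top_sentence_edges and its at_end flag are hypothetical and not part of the original setup.

The sketch takes the first n space-separated words, mirroring substring_index(sentence, ' ', n), so it covers N = 1 through 4 from the subsections above. With at_end=True it takes the last n words instead, which corresponds to a negative count in MySQL's SUBSTRING_INDEX.

One difference from the query: Python groups by exact string equality, while MySQL's GROUP BY compares values under the column's collation. A case- or accent-insensitive collation can therefore merge beginnings that this sketch keeps apart.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_sentence_edges(
    sentences: Iterable[str], n: int = 1, limit: int = 50, at_end: bool = False
) -> List[Tuple[str, int]]:
    """Count the first (or last) n space-separated words of each sentence.

    Hypothetical helper mirroring SUBSTRING_INDEX(sentence, ' ', n) for
    beginnings and SUBSTRING_INDEX(sentence, ' ', -n) for endings. Text is
    split on single spaces only, so a guillemet followed by a space
    (as in "« Bonjour") is counted as a word of its own.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split(" ")
        # Sentences with fewer than n words are kept whole, as in MySQL.
        edge = tokens[-n:] if at_end else tokens[:n]
        counts[" ".join(edge)] += 1
    # Descending by count, like ORDER BY cnt DESC LIMIT limit;
    # ties keep the order in which they were first seen.
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        "Le chat dort .",
        "Le chien aboie .",
        "« Bonjour », dit-il .",
        "La pluie tombe .",
    ]
    print(top_sentence_edges(sample, n=1))               # beginnings, N = 1
    print(top_sentence_edges(sample, n=1, at_end=True))  # endings, N = 1
```

Because the tokenization is the same naive space split the query uses, the two C'est variants in the table (typographic and ASCII apostrophe) would also be counted separately here.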
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.2.1 Most Frequent Sentence Endings I
4.3.2.2 Most Frequent Sentence Endings II
4.3.2.3 Most Frequent Sentence Endings III
4.3.2.4 Most Frequent Sentence Endings IV